
Search for: All records

Creators/Authors contains: "Zhao, Guoyi"


  1. Graph convolutional networks (GCNs) have been shown to be effective in many applications with graph structures. However, training a large-scale GCN is still challenging due to the high computation cost, which grows with the size of the graph. In this paper, we propose CM-GCN, a distributed GCN framework that uses cohesive mini-batches to accelerate large-scale GCN training. The cohesive mini-batches group nodes that are tightly connected in the graph. As a result, CM-GCN can reduce the computation required to train a GCN. We propose a computation cost function to quantify the computation required for mini-batches. By exploiting the submodular property of the computation cost function, we develop an efficient algorithm to partition nodes into tightly coupled mini-batches. Based on the computation cost function, we evenly distribute the workloads of mini-batches to workers. We design asynchronous computations between GCN layers to further eliminate waiting among workers. We implement a CM-GCN framework and evaluate its performance with graphs that contain millions of nodes. Our evaluation shows that CM-GCN can achieve up to 3X speedup without compromising the training accuracy.
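The abstract does not give the exact form of CM-GCN's computation cost function or its partitioning algorithm. The sketch below only illustrates the general greedy idea such a submodular cost enables, using a stand-in cost (the number of distinct nodes a mini-batch must touch, i.e., its members plus their 1-hop neighbors), which is a monotone submodular coverage function. All names, the toy graph, and the greedy policy are illustrative assumptions, not CM-GCN's actual API or algorithm.

# Hypothetical sketch: greedily grow cohesive mini-batches by adding, at each
# step, the node whose inclusion increases the batch's cost the least. The
# cost proxy (distinct nodes the batch touches) stands in for the paper's
# computation cost function; it is monotone and submodular, so marginal gains
# shrink as batches grow, which favors tightly connected groups.

def batch_cost(batch, adj):
    """Cost proxy: batch members plus their 1-hop neighbors."""
    return len(set(batch).union(*(adj[v] for v in batch)))

def cohesive_minibatches(adj, batch_size):
    unassigned = set(adj)
    batches = []
    while unassigned:
        seed = unassigned.pop()
        batch = {seed}
        while len(batch) < batch_size and unassigned:
            # Prefer candidates adjacent to the current batch: their
            # neighborhoods overlap, so their marginal cost is smallest.
            frontier = {u for v in batch for u in adj[v] if u in unassigned}
            candidates = frontier or unassigned
            best = min(candidates, key=lambda u: batch_cost(batch | {u}, adj))
            batch.add(best)
            unassigned.discard(best)
        batches.append(batch)
    return batches

# Toy graph: two triangles joined by a single edge.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(cohesive_minibatches(adj, batch_size=3))

With a batch size of 3, the greedy growth tends to keep each triangle in one mini-batch, so the neighborhood each batch must fetch stays small; that is the effect the cohesive partitioning aims for.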
  2. Data-parallel frameworks have become essential for training machine learning models. The classic Bulk Synchronous Parallel (BSP) model updates the model parameters at pre-defined synchronization barriers. However, when a worker computes significantly slower than the other workers, waiting for the slow worker leads to excessive waste of computing resources. In this paper, we propose a novel proactive data-parallel (PDP) framework. PDP enables the parameter server to initiate updates of the model parameters; that is, an update can be performed at any time, without pre-defined update points. PDP not only initiates updates but also determines when to perform them, and this global decision on update frequency accelerates training. We further propose asynchronous PDP to reduce the idle time caused by synchronizing parameter updates, and we theoretically prove its convergence. We implement a distributed PDP framework and evaluate PDP with several popular machine learning algorithms, including Multilayer Perceptron, Convolutional Neural Network, K-means, and Gaussian Mixture Model. Our evaluation shows that PDP can achieve up to 20X speedup over the BSP model and scale to large clusters.
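The abstract does not specify how the parameter server decides when to initiate an update. The single-process sketch below only illustrates the proactive flavor: updates are triggered by the server's own decision rather than by a fixed barrier. The quadratic toy objective, the simulated worker speeds, and the "enough fresh gradients" trigger are all assumptions for illustration, not PDP's actual policy or implementation.

# Hypothetical sequential simulation of a proactive, server-initiated update
# loop. Workers push gradients at different speeds; the "server" folds pending
# gradients into the model whenever enough have arrived, instead of waiting at
# a BSP barrier for every worker.

import random

def proactive_training(num_workers=4, steps=200, lr=0.1, trigger_every=3):
    w = 0.0                          # toy scalar parameter, optimum at w = 1
    pending = [0.0] * num_workers    # gradients pushed since the last update
    ready = [0] * num_workers        # pushes per worker since the last update
    for _ in range(steps):
        # Each worker makes progress at its own (simulated) speed.
        for i in range(num_workers):
            if random.random() < 1.0 / (i + 1):
                pending[i] += 2.0 * (w - 1.0)   # gradient of (w - 1)^2
                ready[i] += 1
        # Proactive decision: the server initiates the update as soon as
        # enough fresh gradients have arrived (an assumed trigger policy).
        if sum(ready) >= trigger_every:
            w -= lr * sum(pending) / sum(ready)
            pending = [0.0] * num_workers
            ready = [0] * num_workers
    return w

print(proactive_training())   # drifts toward the optimum w = 1.0

Because the update fires on the server's global decision rather than on the slowest worker, fast workers contribute more gradients per update and no one stalls at a barrier, which is the waste the proactive design targets.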